
dtcxzyw (Owner) commented Jun 14, 2025

Link: llvm/llvm-project#125935
Requested by: @dtcxzyw

github-actions bot mentioned this pull request Jun 14, 2025

dtcxzyw (Owner, Author) commented Jun 14, 2025

Diff mode

runner: ariselab-64c-docker
baseline: llvm/llvm-project@f46c44d
patch: llvm/llvm-project#125935
sha256: 81fa43d512d684ba2ff1da4eac0b25dc14bc129d700c58cf7b68c1484db0534b
commit: 82b3348

15 files changed, 2395 insertions(+), 2192 deletions(-)

Improvements:
  early-cse.NumCSE 5803139 -> 5803163 +0.00%
  licm.NumHoisted 5629421 -> 5629439 +0.00%
  scalar-evolution.NumExitCountsComputed 4284852 -> 4284860 +0.00%
  loop-vectorize.LoopsAnalyzed 2062805 -> 2062808 +0.00%
  instcombine.NumCombined 131973741 -> 131973841 +0.00%
  dse.NumRemainingStores 47489716 -> 47489720 +0.00%
  inline.NumInlined 44574719 -> 44574720 +0.00%
Regressions:
  abstract-call-sites.NumDirectAbstractCallSites 9869383 -> 9869382 -0.00%
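
The +0.00% entries next to nonzero deltas follow from the size of the counters. Here is a minimal Python sketch, assuming the bot reports the relative change (after - before) / before as a percentage rounded to two decimals. The bot's actual formula is not shown in this report.

```python
# Counter values copied from the Improvements/Regressions lists above.
counters = {
    "early-cse.NumCSE": (5803139, 5803163),
    "licm.NumHoisted": (5629421, 5629439),
    "scalar-evolution.NumExitCountsComputed": (4284852, 4284860),
    "loop-vectorize.LoopsAnalyzed": (2062805, 2062808),
    "instcombine.NumCombined": (131973741, 131973841),
    "dse.NumRemainingStores": (47489716, 47489720),
    "inline.NumInlined": (44574719, 44574720),
    "abstract-call-sites.NumDirectAbstractCallSites": (9869383, 9869382),
}


def pct_change(before: int, after: int) -> float:
    """Relative change in percent (assumed formula, not taken from the bot)."""
    return (after - before) / before * 100


for name, (before, after) in counters.items():
    # e.g. +24 on ~5.8M CSEs is ~0.0004%, which rounds to +0.00%
    print(f"{name}: {after - before:+d} ({pct_change(before, after):+.2f}%)")
```

Every delta is well under 0.001% of its counter, so all of them display as +0.00% or -0.00% at two decimals.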

24 24 bench/boost/optimized/dump_ssse3.ll
6 6 bench/libquic/optimized/poly1305_vec.ll
4 4 bench/libwebp/optimized/lossless_enc_sse2.ll
1 1 bench/libwebp/optimized/lossless_sse2.ll
2 2 bench/llama.cpp/optimized/ggml-cpu-quants.ll
253 257 bench/ncnn/optimized/imreadwrite.ll
78 78 bench/node/optimized/simdutf.ll
8 8 bench/ocio/optimized/Lut1DOpCPU_SSE2.ll

github-actions (Contributor)

Here is a high-level summary of the most significant changes in the provided LLVM IR diff:

  1. Addition of nuw and nsw flags to vector add instructions
    Multiple <16 x i8> and <4 x i32> add instructions now carry nuw nsw (no unsigned wrap, no signed wrap). These flags assert that the addition overflows neither as an unsigned nor as a signed operation. If it does overflow, the result is poison. Later passes can therefore ignore wrapping behavior when simplifying or widening these computations.

  2. Use of the exact flag for vector lshr (logical shift right) instructions
    In several SSE2/SSE4-based functions, lshr operations on <4 x i32> vectors now carry the exact flag. This asserts that no non-zero bits are shifted out, so shifting the result back left restores the original value. If any set bit is discarded, the result is poison. This allows simplifications such as folding a following shl back into the original value or treating the shift as an exact division by a power of two.

  3. Replacement of plain or with or disjoint
    The patch turns many plain or instructions into or disjoint, particularly on <2 x i64> and <16 x i16> types. The disjoint flag asserts that the two operands have no set bits in common. If they do, the result is poison. Under that condition or computes the same value as add, so later passes may treat it as an add where that is profitable.

  4. Use of zext nneg instead of plain zext or sext
    Some zero-extend (zext) operations now carry the nneg (non-negative) flag. This flag asserts that the operand's sign bit is zero. If it is not, the result is poison. In some cases, sext was also replaced with zext. For a non-negative operand, zero-extension and sign-extension produce the same value, so the two forms are interchangeable there.

  5. Reordering and restructuring of shuffle/select patterns in SSE2 code
    In the stbi__idct_simdPhiPs function (part of an IDCT transform), the shufflevector patterns are substantially reorganized and the computations are regrouped. This restructures the code without changing what it computes. It likely affects register usage or how data is arranged for downstream operations.
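
A toy Python model can make the flag semantics above concrete. It is not LLVM code. It represents poison with a None sentinel and models a single vector lane, since LLVM applies these flags elementwise. The helper names add, lshr, or_, zext, and sext are hypothetical. Each flag turns the result into poison when the property it asserts does not hold, following the LLVM LangRef rules.

```python
POISON = None  # stand-in for LLVM's poison value in this toy model


def to_signed(x: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    return x - (1 << bits) if x >> (bits - 1) else x


def add(a: int, b: int, bits: int, nuw: bool = False, nsw: bool = False):
    """One lane of `add`; a and b are unsigned patterns in [0, 2**bits)."""
    mask = (1 << bits) - 1
    if nuw and a + b > mask:  # unsigned overflow
        return POISON
    s = to_signed(a, bits) + to_signed(b, bits)
    if nsw and not -(1 << (bits - 1)) <= s < (1 << (bits - 1)):  # signed overflow
        return POISON
    return (a + b) & mask  # plain add wraps modulo 2**bits


def lshr(a: int, n: int, bits: int, exact: bool = False):
    """One lane of `lshr`; `exact` forbids shifting out any set bit."""
    if n >= bits:  # LLVM also yields poison for an oversized shift amount
        return POISON
    if exact and a & ((1 << n) - 1):
        return POISON
    return a >> n


def or_(a: int, b: int, disjoint: bool = False):
    """One lane of `or`; `disjoint` forbids overlapping set bits."""
    if disjoint and a & b:
        return POISON
    return a | b


def zext(a: int, src_bits: int, nneg: bool = False):
    """One lane of `zext`; `nneg` requires the source sign bit to be clear."""
    if nneg and a >> (src_bits - 1):
        return POISON
    return a  # zero-extension keeps the unsigned value


def sext(a: int, src_bits: int, dst_bits: int) -> int:
    """One lane of `sext`, returned as an unsigned dst_bits-wide pattern."""
    return to_signed(a, src_bits) & ((1 << dst_bits) - 1)
```

For inputs that satisfy a flag, the flagged and unflagged forms agree. For example, or_(0xF0, 0x0F, disjoint=True) equals 0xF0 + 0x0F, and zext(0x7F, 8, nneg=True) equals sext(0x7F, 8, 16). The flags only add information that later transforms can rely on.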


Overall, these updates are targeted refinements of vectorized code across several benchmarks. Most of them attach poison-generating flags (nuw, nsw, exact, disjoint, nneg) that record facts the optimizer has proven about overflow, discarded bits, overlapping bits, and sign. Later middle-end passes and the backend can exploit these facts when lowering to SIMD instructions.

model: qwen-plus-latest
CompletionUsage(completion_tokens=488, prompt_tokens=63820, total_tokens=64308, completion_tokens_details=None, prompt_tokens_details=None)

dtcxzyw closed this Jun 14, 2025
dtcxzyw deleted the test-run15650641857 branch June 16, 2025 05:02